Speech-synthesis system

ABSTRACT

The present invention relates to a speech synthesis system wherein the acoustic message items are optically stored. The system according to the invention comprises an optical store including a plurality of diffractive elements pertaining to the holographic technique. Each element is constructed for selectively projecting images carrying acoustic information in optical form, each projected image being converted into electrical speech signals through an image converter tube associated with a loudspeaker. The image selection and its analysis are controlled by addresses supplied from a computer.


Spitz et al.

SPEECH-SYNTHESIS SYSTEM
[75] Inventors: Erich Spitz; Michel Rembault, both of Paris, France
[73] Assignee: Thomson-CSF, Paris, France
[22] Filed: Jan. 30, 1974
[21] Appl. No.: 437,873

Related U.S. Application Data
[63] Continuation of Ser. No. 237,686, March 24, 1972, abandoned.

[30] Foreign Application Priority Data
Apr. 2, 1971 France 71.11741
[52] U.S. Cl. 179/1 SA
[51] Int. Cl. G10L 1/00, G11C 13/04
[58] Field of Search 179/1 SA, 1 SM, 1 SP, 1 VS, 179/1 VC, 1 SF, 1 56; 340/173 R; 35/35 R
[56] References Cited
UNITED STATES PATENTS
3,114,980 12/1963 Davis 35/35
OTHER PUBLICATIONS
Vitols, V. A., "Hologram Memory," IBM Tech. Disclosure Bulletin, Vol. 8, No. 11, April 1966.

Primary Examiner: William C. Cooper
Assistant Examiner: E. S. Kemeny
Attorney, Agent, or Firm: Cushman, Darby and Cushman

4 Claims, 5 Drawing Figures

[Drawing sheets 1 to 5 (FIGS. 1 to 5), patented Mar. 4, 1975]

SPEECH-SYNTHESIS SYSTEM

This application is a continuation of our copending application, Ser. No. 237,686, filed Mar. 24, 1972, now abandoned.

The present invention relates to systems designed to effect synthesis of acoustic messages and more particularly to the utilisation in said systems of a fast-access optical store with a high storage capacity.

One of the numerous examples of application of these devices is illustrated by conversion units which enable the electrical data supplied by a computer to be transformed into acoustic signals simulating speech.

The acoustic messages synthesised by the above mentioned units are generally developed from words or phonetic elements previously stored and reconstituted in a predetermined order under the control of a computer program.

In variant embodiments, the vocabulary takes the form of numerical data stored in a direct-access memory of the computer, the conversion unit transforming these data into acoustic signals generated by a multichannel synthesiser of the vocoder type. Unfortunately, the storage capacities required by these systems rapidly become prohibitive.

In other embodiments, the elements of the acoustic message are recorded in analog form in order to make them directly accessible to read-out. The data carrier may, for example, be a magnetic drum, a disc or a film. The difficulty of reducing the access time to the message elements is one of the chief drawbacks of these systems.

The object of the present invention is to overcome these drawbacks by using the holographic technique as a means of storage of the acoustic signals or simply as a means of rapid access to the stored information.

According to the invention, there is provided a speech synthesis system for synthesising acoustic messages made of a succession of elementary acoustic signals under the control of information data including addresses for each of said acoustic signals; said synthesis system comprising: holographic storage means including a plurality of elementary fringe patterns arranged side by side on a support; illuminating means controlled by said addresses for directing toward any one of said fringe patterns a beam of monochromatic radiant energy; image analyser means having an input face positioned for receiving from any one of said fringe patterns a reconstructed image of said acoustic signals upon illumination thereof by said beam; and electro-acoustical means having an electrical input coupled to the electrical output of said image analyser means for delivering said acoustic messages.

For a better understanding of the invention and to show how the same may be carried into effect, reference will be made to the following description and the appended drawings, among which:

FIG. 1 is a functional diagram illustrating the various components of a speech-synthesis device in accordance with the invention.

FIGS. 2 and 3 are first and second examples of a system in accordance with the invention.

FIGS. 4 and 5 are explanatory illustrations.

As shown in FIG. 1, a storage block 1 contains in optical form all the information items required for speech synthesis. In accordance with the invention and as described in more detail in relation to the following figures, the storage block 1 comprises a plurality of diffractive elements of holographic type; each diffractive element is designed to reconstitute, by projection of an image, an elementary form of the acoustic message and is defined by its position, i.e. by its address.

The sequence of information items which, juxtaposed in time, form the desired acoustic message is constructed from a program supplied by a computer 2.

The addresses produced by the computer 2 control scanning means 3 which are designed to selectively illuminate the various regions of the store 1 which give access to the various information items. When the store 1 receives localised illumination from the scanned beam source 3, it projects an image onto the entry face of an optical-electrical analyser 6; the electrical signal produced by the analyser 6 in response to the received image is applied to electro-acoustic means comprising an audio amplifier 9 and a loudspeaker 10.
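The chain of FIG. 1 can be illustrated by a minimal software sketch. It is only an analogy under stated assumptions: the store is modelled as a lookup table from an address to the electrical samples the analyser would deliver, and the names `Address`, `Track`, `synthesise` and the sample values are hypothetical and do not appear in the original description.

```python
from typing import Dict, List, Sequence, Tuple

Address = Tuple[int, int]   # hypothetical (row, column) position of a diffractive element
Track = List[float]         # electrical samples recovered from one projected image


def synthesise(program: Sequence[Address], store: Dict[Address, Track]) -> List[float]:
    """Concatenate the signals read from the store in the order given by the program."""
    signal: List[float] = []
    for address in program:
        # Scanning means 3 illuminate the element at this address; the store 1
        # projects its image and the analyser 6 turns it into an electrical signal.
        signal.extend(store[address])
    return signal            # would feed the amplifier 9 and the loudspeaker 10


# Toy store with three elements; the values are placeholders, not real speech.
store = {(0, 0): [0.1, 0.2], (0, 1): [-0.3], (1, 0): [0.0, 0.5, -0.5]}
message = synthesise([(0, 1), (1, 0), (0, 0)], store)
```

The order of the addresses in the program alone determines the order of the elementary signals in the output, which is the role the description assigns to the computer 2.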

The acoustic information to be stored for the purpose of subsequent reconstitution and processing by the synthesiser of FIG. 1 comes from a sampling operation carried out on those messages which it is desired to synthesise. The samples are, for example, the words of a dictionary.

A more elaborate solution consists in employing more elementary samples such as phonemes, or phonetic elements which are constituted by the association of at least two phonemes.

In order to facilitate the following description, the chosen samples will be phonemes, but it should be clearly understood that any other method of sampling could equally well be selected.

Each of the phonemes represents an item of the acoustic message. By known techniques of optical recording of sound, it is stored in the form of an optical signal.

Each optical signal representing a phoneme can be constituted by a track of non-uniform transparency recorded upon a photographic substrate. The optical store in accordance with the invention, is constructed from a set of transparencies carrying for example 30 characteristic phonemes required for the synthesis of the acoustic messages.
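By way of illustration, one simple way of picturing a track of non-uniform transparency is a linear mapping between the instantaneous signal amplitude and the transmittance of the film. The description does not specify the recording law, so the linear variable-density mapping below, and the function names `to_transparency` and `from_transparency`, are assumptions made for the sketch only.

```python
from typing import List


def to_transparency(samples: List[float]) -> List[float]:
    """Map samples in [-1, 1] to transmittance values in [0, 1] (assumed linear law)."""
    return [min(1.0, max(0.0, 0.5 * (s + 1.0))) for s in samples]


def from_transparency(track: List[float]) -> List[float]:
    """Invert the mapping, as a photodetector followed by removal of the mean level would."""
    return [2.0 * t - 1.0 for t in track]


samples = [0.0, 0.5, -1.0, 1.0]
track = to_transparency(samples)        # transmittance along the track
recovered = from_transparency(track)    # signal recovered on read-out
```

With this mapping a silent passage corresponds to a half-transparent track, and the read-out restores the original samples.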

FIG. 2 provides a schematic illustration of a first variant embodiment of a synthesiser in accordance with the invention. In accordance with this embodiment, the transparencies representing the phonemes are recorded on a substrate 11 in the form of micro-photographs 20 arranged in row and column fashion. By use of a known technique, these are associated with a matrix 12 of holographic lenses 22, the latter being designed to project the images of the micro-photographs opposite them onto the window 13 of an optical-electrical analyser 14. The analyser 14 may be a vidicon tube. By successively illuminating the pairs formed in each case by image and holographic lens, it is possible to produce on the window 13 of the analyser 14 the sequence of images containing the information corresponding to the synthesis of the acoustic message. The illumination can be by means of a monochromatic light beam which can be used to successively illuminate the micro-photographs 20 or, again, by means of a matrix of monochromatic light sources. By way of example, the light sources are constituted by photo-emissive diodes connected to a supply source 18 through the medium of switching devices 16 and 17 respectively referenced feed X and feed Y.

The computer 19 supplies lighting commands to the X and Y feeds 16 and 17, in accordance with a previously determined program. For example, if at a given instant, during a synthesis sequence, the phoneme corresponding to the micro-image 20 is to be reconstructed, the computer 19 supplies the command to the feed X 16 to switch to the position b and to the feed Y 17 to switch to the position d; the light radiation 21 emitted by the source whose address corresponds to the coordinates (b, d) illuminates the micro-photograph 20. The corresponding holographic lens 22 then projects the image of the micro-photograph 20 onto the target 13 of the vidicon tube 14. Concurrently with the addresses defining the illuminated zone of the optical store, the computer 19 supplies to a scanner element 23 a command to scan the image picked up by the target 13. The image must be scanned in accordance with the line of the optical tracks projected onto the target 13; the analyser thus produces an electrical signal 24 which is then amplified by the audio amplifier 25 and supplied to the terminals of a loudspeaker 26.
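The X and Y feed selection described above amounts to row and column addressing of the diode matrix. The following sketch models it under assumptions that are not in the original: a hypothetical matrix of 6 columns by 5 rows (30 phonemes), switch positions b and d taken as the indices 1 and 3, and a function name `light_diode` chosen for the example.

```python
from typing import List, Tuple


def light_diode(rows: int, cols: int, address: Tuple[int, int]) -> List[List[bool]]:
    """Return the on/off state of a rows x cols diode matrix with exactly one diode lit."""
    x, y = address
    if not (0 <= x < cols and 0 <= y < rows):
        raise ValueError(f"address {address} outside {cols} x {rows} matrix")
    # A diode is supplied only where the selected X feed and Y feed cross.
    return [[(c == x and r == y) for c in range(cols)] for r in range(rows)]


# Assumed layout: 6 columns x 5 rows; positions 'b' and 'd' mapped to indices 1 and 3.
state = light_diode(rows=5, cols=6, address=(1, 3))
lit = [(c, r) for r, row in enumerate(state) for c, on in enumerate(row) if on]
```

Only the diode at the crossing of the two selected feeds is lit, so a single micro-photograph is projected onto the target at any instant.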

The acoustic message reconstituted by the loudspeaker 26 will only be intelligible if the number of elementary data and the cadence of transmission of these data, satisfy certain rules fixed by the technique of telecommunications.

In the case of a band of frequencies corresponding to a range of acoustic frequencies extending between 200 and 3000 c/s, and taking a signal-to-noise ratio of the order of 42 dB, the information cadence is close to 47,600 bits per second. On the other hand, it is well known that in current language the mean transmission rate is in the order of 2 words per second. Assuming that each word contains an average of 3 phonemes, the number of bits will be about 8,000 per phoneme.
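The figure of 8,000 bits per phoneme follows from the stated rates by simple division. The short calculation below reproduces it; the information cadence of 47,600 bits per second is taken as given from the text and is not rederived here, and the variable names are chosen for the example.

```python
bit_rate = 47_600          # bits per second, as stated in the text
words_per_second = 2       # mean rate of current language, as stated
phonemes_per_word = 3      # assumed average, as stated

phonemes_per_second = words_per_second * phonemes_per_word
bits_per_phoneme = bit_rate / phonemes_per_second

print(phonemes_per_second, round(bits_per_phoneme))
```

The division gives a little under 8,000 bits per phoneme, which the description rounds up to 8,000.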

In practice, it is possible to produce holographic lenses which are capable of distinguishing two points which are spaced 3 microns apart. Taking the transverse dimension of a bit as four times this distance, namely 12 microns, this means that on a substrate with a side length of 1.08 mm a matrix of (90 × 90) bits can be recorded; this combination produces 8,100 bits. Assuming the light source to be a matrix of electroluminescent gallium arsenide diodes and using as optical-electrical analyser a matrix of photo-electric diodes, the time taken by addressing, transmission and read-out is in the order of 100 nanoseconds, and this gives a theoretical read-out cadence very substantially in excess of the aforesaid rate of 47,600 bits per second.

The small size of the micro-images, as well as their access time, confers remarkable properties on the store in accordance with the invention.
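The geometry and the read-out margin quoted above can be checked with a few lines of arithmetic. The text does not say whether the 100 nanoseconds apply to one bit or to a whole micro-image; the sketch assumes the less favourable reading of one bit per 100 nanoseconds, and this assumption, like the variable names, is not part of the original.

```python
resolution_um = 3                    # resolvable point spacing, from the text
bit_pitch_um = 4 * resolution_um     # 12 microns per bit
bits_per_side = 90

side_mm = bits_per_side * bit_pitch_um / 1000   # side length of one micro-image
bits_per_image = bits_per_side ** 2             # capacity of one micro-image

# Assumption: 100 ns is spent per bit (pessimistic reading of the text).
access_time_s = 100e-9
readout_rate = 1 / access_time_s                # bits per second
margin = readout_rate / 47_600                  # ratio to the required cadence

print(side_mm, bits_per_image, round(margin))
```

Even under this pessimistic assumption the read-out cadence exceeds the required 47,600 bits per second by more than two orders of magnitude.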

In the variant embodiment of FIG. 2, described hereinbefore, the matrix of micro-images 11 can be considered as a stage constituting the optical storage plane and the matrix 12 of holographic lenses as a second stage constituting the optical transfer plane.

In accordance with another variant embodiment of the invention, shown in FIG. 3, these two storage and transfer stages are combined into a single stage which performs both functions and is simply constituted by a juxtaposed arrangement of holograms.

The acoustic information is previously processed to form the optical recording of each phoneme; this optical recording then serves to construct a hologram.

A complete store designed in accordance with the invention contains 30 holograms in the case where the synthesis of the acoustic message is based upon the use of 30 characteristic phonemes. In the case of synthesis by words or by phonetic elements, a store will contain as many holograms as there are words or phonetic elements deemed necessary to obtain correct speech synthesis.

The store 27 in accordance with the invention thus has as many holograms as there are phonemes, distributed in the plane x, y in accordance with well defined addresses. A laser source 28 emitting light radiation can in particular be utilised to selectively illuminate each of these holograms in order to enable the reconstitution of the data stored therein. The light radiation 29 must strike each of the holograms in the optical storage plane 27 at an angle of incidence which is determined by the conditions under which the hologram is recorded. The translational displacement or deflection of the light beam 29 is supplied by the deflector 30. For example, if, during a sequence of synthesising a given message, the phoneme corresponding to the hologram 31 whose coordinates are (x, y) is to be reconstituted, a computer 19 (similar elements carry the same references as in the preceding figure) supplies an address to the control element 32 of the deflector 30. This address determines the orientation of the entering radiation 29 so that said radiation, after passing through the deflector 30, occupies the position 33 of the exit plane thereof, enabling it to correctly illuminate the hologram 31. The recording of the hologram 31, as also of all the holograms recorded in the storage plane 27, ensures the projection of the data carried by the hologram onto the target 13 of an analysis tube 14. Concurrently with the positioning addresses supplied by the computer 19 to the control element 32 of the deflector 30, the computer 19 triggers a cycle of operation of the scanner device 23 associated with the optical-electrical analyser 14. In the case of a vidicon, this scanning is effected by means of two deflection coils 34 and 35.

The electrical output signal 24 resulting from the analysis of the optical input signal, is amplified by the audio amplifier 25 and applied to the terminals of a loudspeaker 26 which reconstitutes it in an audible form.

As a function of the program supplied from the computer 19, all the data stored in the holograms contained in the storage plane 27, are reconstructed in the order corresponding to the spoken message.

The conditions of intelligibility which have been developed previously in the case of the variant embodiment shown in FIG. 2, are still valid in the case of the variant embodiment of FIG. 3.

FIG. 4 schematically illustrates an optical device for constructing holograms for utilisation in the device of FIG. 3. The hologram-constructing device is designed so that the holograms can project a real image into the read-out equipment. FIG. 4 illustrates two variant utilisations which differ from one another by the direction in which it is chosen to apply the reference wave to the photographic emulsion 111.

Coherent radiation issuing from a laser source (not shown in FIG. 4) is split by a beam splitter 101 into two beams 102 and 103.

The object beam 102 illuminates a diffracting object 104 constituted in the case of the present invention by a transparency upon which there has previously been recorded, in the form of a non-uniform transparency pattern, one of the phonemes utilised for the synthesis of the acoustic messages. The object 104, under the action of the beam 102, emits diffracted radiation which is picked up by a lens 106 designed to project the real image A1B1 of the object AB which coincides with the object plane M. The light emerging from the lens 106 is received by a portion of the unexposed emulsion 111 carried by the substrate 110. The same portion of the emulsion 111 receives the reference beam 103 via the mirror 126 and the semi-transparent plate 107. Under the action of these two received radiation fractions, a pattern of interference fringes is formed at the surface of the emulsion 111 and recorded there. After development of the emulsion 111, a hologram H1H2 is obtained which, when illuminated with a read-out beam having the same characteristics as the beam reflected by the plate 107, projects a real image A1B1 onto the plane M1. The same result can be obtained if the beam 103 is directed onto that face of the emulsion 111 opposite to the one which receives the light emerging from the lens 106; in this case, the elements 107 and 126 are replaced by the mirrors 112 and 113.

If (x, y) represents the optical axis of the store described in FIG. 3, the distance d separating the plane M1 from the plane M of the photographic emulsion corresponds to the distance separating the storage plane 27 from the target plane 13 of the analysis tube 14. Thus, at the time of reconstitution, the hologram H1H2 of FIG. 4, centred in relation to the axis O1O2, will form upon the target of the analysis tube, centred in relation to the axis (x, y), a real image which contains all the data carried by the object AB. By a process of recording which is repeated by changing the object and displacing the emulsion 111 and the lens 106, the series of holograms forming the holographic matrix 27 can be produced.

Each of these holograms is designed for projecting a real image which is in all cases centred on the target of the analysis tube.

FIG. 5 schematically illustrates another device for hologram construction, which differs from that of FIG. 4 by the fact that the projected image is a virtual image. To simplify the description, the same references as those used in the preceding figure have been chosen.

The object 104, taking the form of a transparent substrate containing the image AB representing the phoneme to be stored, is placed in the plane M and illuminated by a beam 102 derived from the radiant energy 120 produced by a laser source 121 and previously reflected at an auxiliary mirror 122. Another fraction of the radiant energy 120 is reflected by the semi-transparent mirror 130 in the form of radiant energy 103 constituting the reference wave, which interferes at the plane M with the wave 109 diffracted by the image AB. A transparent substrate 110 carrying a photographic emulsion 111 is placed in this plane. A mask 123 has been provided for limiting the exposure of the emulsion 111 to the zone reserved for formation of the hologram. After the developing of the photographic plate, the hologram H1H2, centred in relation to the axis O1O2, contains all the information carried by the object AB. When correctly illuminated by a reconstruction wave, the hologram thus produces all these data in the form of a virtual image located at the distance D1 from the hologram plane.

To the right of the emulsion 111 a lens L has been illustrated which does not form part of the hologram construction device. This lens forms part of the optical system for analysing the images and has been shown in FIG. 5 in order to indicate how the virtual image produced by the hologram on read-out can be used to form a real image A1B1 in a plane M1; the plane M1 will in this case be the plane of the target 13 of the optical-electrical detector 14.

As in the case of the device of FIG. 4, it is possible on the same substrate to produce the number of holograms corresponding to the number of phonemes which are to be stored. In the machine for manufacturing the holograms, it is merely necessary to displace the mask 123 in order to uncover a fresh portion of photographic emulsion each time a new object is recorded.

At the time of reconstruction, the virtual images produced by the holograms are all located in the plane M at the distance D1 from the hologram plane. The lens L will convert each of these virtual images into a real image centred in relation to the axis (x, y) on the target of the detector.

In FIG. 5, the reference wave 103 is projected onto the photographic emulsion at the same side as the wave carrying the object, but this reference wave 103 could, by the use of a set of auxiliary mirrors not shown in FIG. 5, be made to illuminate the photographic emulsion on the opposite face, as in the case of FIG. 4.

What we claim is:

1. Speech synthesis system for synthetizing acoustic messages made of a succession of elementary acoustic signals under the control of information data including addresses and a command signal for each of said acoustic signals; said system comprising:

holographic storage means including a plurality of elementary fringe patterns arranged side by side on a support;

illuminating means controlled by said addresses for directing toward any one of said fringe patterns a beam of monochromatic radiant energy;

image analyser means including an input face positioned for receiving from any one of said fringe patterns an image of said acoustic signals upon illumination thereof by said beam, a scanner device ensuring scanning of said face under the triggering action of said command signal and an electrical output;

and electro-acoustical means, having an electrical input coupled to said electrical output, for delivering said acoustic messages.

2. Speech synthesis system as claimed in claim 1, wherein said analyser means comprises a vidicon tube.

3. Speech synthesis system for synthetizing acoustic messages made of a succession of elementary acoustic signals under the control of information data including addresses for each of said acoustic signals; said system comprising:

a set of transparencies arranged side by side in rows and columns and respectively representative of said acoustic signals;

a set of holographic lenses, each said lens respectively facing one said transparency and projecting onto a same common area an image of said transparency;

a set of electroluminescent diodes controlled by said addresses, each said diode respectively illuminating one said transparency;

image analyser means including an input face positioned in coincidence with said area for receiving from any one of said transparencies an image of said acoustic signals upon illumination thereof by said diodes, a scanner device ensuring scanning of said face and an electrical output;

and electro-acoustical means, having an electrical input coupled to said electrical output, for delivering said acoustic messages.

4. Speech synthesis system for synthetizing acoustic messages made of a succession of elementary acoustic signals under the control of information data including addresses for each of said acoustic signals; said system comprising:

a coherent source emitting a beam of monochromatic radiant energy;

electro-optical deflecting means for deflecting said beam, said deflecting means having a control input for receiving said addresses;

a set of elementary holograms arranged side by side in rows and columns for receiving said deflected beam, each of said holograms, when illuminated by said beam, respectively projecting, directly onto a same common area, a real image of one of said elementary acoustic signals;

image analyser means, including an input face positioned in coincidence with said area for receiving said images from any one of said holograms, a scanner device ensuring scanning of said face and an electrical output;

and electro-acoustical means, having an electrical input coupled to said electrical output, for delivering said acoustic messages.